Analyzing the Cost of a Cache Miss Using Pipeline Spectroscopy

Authors

  • Thomas R. Puzak
  • Allan Hartstein
  • Philip G. Emma
  • Vijayalakshmi Srinivasan
  • Arthur Nadas

Abstract

We describe a new technique called Pipeline Spectroscopy that allows us to precisely measure the cost of each cache miss. The cost of a miss is displayed (graphed) as a histogram, which gives a precise, detailed readout of the cost of each cache miss across all levels of the memory hierarchy. We call the graphs ‘spectrograms’ because they reveal certain signature characteristics of the processor’s memory hierarchy, the pipeline, and the miss pattern itself. We show that in a memory hierarchy with N cache levels (L1, L2, ..., LN, and memory) and a miss cluster of size C, there are $\binom{C+N}{C}$ possible miss penalties. This represents all possible sums from all possible combinations of the miss latencies, with and without overlap, from each level of the memory hierarchy (L2, L3, ..., memory) for a given cluster size. Additionally, we present a theory that describes the shape of a spectrogram, and we use this theory to predict the shape of spectrograms for larger miss clusters. Next, we provide two examples that use spectroscopy to optimize the processor’s hardware or the application’s software. The first example uses a miss spectrogram to improve the software design of an application. The second uses a miss spectrogram to analyze bus queuing. Our experiments show that performance gains of up to 8% are possible. Detailed analysis of a spectrogram leads to much greater insight into pipeline dynamics, including effects due to prefetching and miss queuing delays.
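To make the counting argument concrete, the following is a minimal sketch under a simplifying assumption that is ours, not the paper's model. It assumes each miss in a cluster is either fully overlapped (contributing 0 cycles) or exposes the full latency of one of the N levels below L1. That gives N + 1 choices per miss, and the number of multisets of size C drawn from N + 1 choices is $\binom{C+N}{C}$. The latency values and the helper names (`penalty_combinations`, `count_penalties`, `spectrogram`) are hypothetical. `spectrogram` only illustrates the histogram display step, not how Pipeline Spectroscopy actually measures miss cost in the pipeline.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb


def penalty_combinations(latencies, cluster_size):
    """Enumerate every multiset of per-miss contributions for one miss cluster.

    Illustrative assumption (not the paper's model): each miss is either
    fully overlapped (0 cycles) or exposes the full latency of one of the
    N levels below L1, so each miss has N + 1 possible contributions.
    """
    options = [0] + list(latencies)
    return list(combinations_with_replacement(options, cluster_size))


def count_penalties(n_levels, cluster_size):
    """Closed form: multisets of size C from N + 1 options = binom(C + N, C)."""
    return comb(cluster_size + n_levels, cluster_size)


def spectrogram(miss_costs, bin_width=1):
    """Histogram of per-miss costs: bin start (cycles) -> number of misses."""
    return Counter((cost // bin_width) * bin_width for cost in miss_costs)


if __name__ == "__main__":
    # Hypothetical latencies in cycles for L2, L3 and memory (so N = 3).
    latencies = [12, 40, 300]
    combos = penalty_combinations(latencies, cluster_size=2)
    # Enumeration and closed form agree: binom(2 + 3, 2) = 10.
    assert len(combos) == count_penalties(len(latencies), 2)
    print(len(combos))                       # 10
    print(sorted({sum(c) for c in combos}))  # [0, 12, 24, 40, 52, 80, 300, 312, 340, 600]
```

The closed form counts combinations, not unique cycle values: distinct sums can be fewer than $\binom{C+N}{C}$ when two combinations happen to add up to the same total (for example, if one latency is exactly twice another).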

Similar resources

Measuring The Cost Of A Cache Miss

It is vital that the cost of a cache miss be accurately measured in order for many hardware and software optimizations to occur. In this paper we describe a new technique, called pipeline spectroscopy, that allows pipeline delays to be monitored and analyzed in detail. We apply this technique to produce a cache miss ‘spectrogram’, which represents a precise readout showing a detailed histogram ...

DSTRIDE: Data-Cache Miss-Address-Based Stride Prefetching Scheme for Multimedia Processors

Prefetching reduces cache miss latency by moving data up in memory hierarchy before they are actually needed. Recent hardware-based stride prefetching techniques mostly rely on the processor pipeline information (e.g. program counter and branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design and require that existi...

Predictive Sequential Associative Cache

Traditionally, set-associative caches are implemented by comparing all blocks in a cache set in parallel for each reference and then selecting the desired block from the set. By providing more than one location for holding the data for a particular memory address, set associativity reduces the cache miss rate for most programs. The traditional solution is, however, not without cost. As contrast...

A multithreaded

This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC processor, and its memory system. Because this processor is used only in IBM iSeries and pSeries commercial servers, it is optimized solely for commercial server workloads. Increasing miss rates because of trends in commercial server applications and increasing latency of cache misses because of rapidly increasin...

Architectural and implementation tradeoffs in the design of multiple-context processors

Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. In this paper, we examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability...

Journal title:
  • J. Instruction-Level Parallelism

Volume 10

Publication date: 2008